Multilingual Plagiarism Detection
Abstract
Cross-lingual plagiarism detection has recently attracted attention because of copyright violations in many fields, such as education, journalism, scientific research, literature, and screenplays, where an author translates an article from language L1 into language L2 and then either publishes or submits it as is, or changes some of the sentences to suit his or her purposes. This creates the need for a robust method for cross-lingual plagiarism detection. Most existing work on cross-lingual plagiarism detection uses machine translation to translate the suspect document from L1 into L2 and then searches for similar documents in L2. However, we argue that this approach suffers from the following limitations: (1) machine translation does not capture the different writing styles that vary from one field to another; (2) online machine translation services that allow anonymous users to suggest better translations are vulnerable to tampering and incorrect suggestions; (3) it has limited ability to distinguish between different types of plagiarism: for example, two articles describing an accident might be labeled as plagiarized although they originated from different sources. We therefore propose an approach that attempts to remedy these three limitations using machine learning and crowdsourcing techniques.
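The translate-then-search baseline described in the abstract can be illustrated with a minimal sketch. This is not the authors' method or any cited system: the lexicon `TOY_LEXICON`, the functions `translate`, `cosine`, and `rank_candidates`, and the sample sentences are all hypothetical, and a word-by-word dictionary lookup stands in for a real machine translation system.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (Spanish -> English) standing in for a real
# machine translation system; it exists only for this illustration.
TOY_LEXICON = {"el": "the", "gato": "cat", "negro": "black", "duerme": "sleeps"}


def translate(text, lexicon=TOY_LEXICON):
    """Word-by-word stand-in for MT; unknown words pass through unchanged."""
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def cosine(a, b):
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[word] * vb[word] for word in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def rank_candidates(suspect_l1, corpus_l2):
    """Translate the suspect document into L2, then rank L2 documents
    by similarity to the translation, most similar first."""
    translated = translate(suspect_l1)
    scored = [(cosine(translated, doc), doc) for doc in corpus_l2]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


ranked = rank_candidates(
    "el gato negro duerme",
    ["stock markets fell sharply", "the black cat sleeps"],
)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

Because the comparison here relies only on word overlap, two independently written reports of the same event would also score highly, which is the kind of false positive that limitation (3) points to.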
Related resources
Multilingual Plagiarism Detection
Multilingual aspects have been gaining more and more attention in recent years. This trend has been accentuated by the global integration of European states and the vanishing cultural and social boundaries. Multilingual text processing has become an important field bringing a lot of new and interesting problems. This paper describes a novel approach to multilingual plagiarism detection. We prop...
Метод выявления заимствований в текстах разноязычных документов (A Method of Automatic Plagiarism Detection in Multilingual Documents)
Cross-lingual Similarity Calculation for Plagiarism Detection and More - Tools and Resources
Agenda • EC-Joint Research Centre (JRC) – Who we are • Monolingual plagiarism detection (PD) work at the JRC • Cross-lingual similarity calculation at the JRC • Named entity (NE) matching across languages • Linking related news items across languages • Identifying translations of documents • JRC's multilingual tools and resources • Summary JRC-Who we are • European Commission (scientific-techni...
Cross-language plagiarism detection
Cross-language plagiarism detection deals with the automatic identification and extraction of plagiarism in a multilingual setting. In this setting, a suspicious document is given, and the task is to retrieve all sections from the document that originate from a large, multilingual document collection. Our contributions in this field are as follows: (i) a comprehensive retrieval process for cros...
Cross-Language Plagiarism Detection Methods
The present paper provides a summary of the existing approaches to plagiarism detection in a multilingual context. Our aim is to organize the available data for further research. Distant language pairs are of particular interest to us. Cross-language plagiarism detection has acquired pronounced importance lately, since semantic contents of a document can be easily and disc...
Fuzzy-Semantic Similarity for Automatic Multilingual Plagiarism Detection
A word may have multiple meanings or senses. This can be modeled by treating each word in a sentence as having a fuzzy set of words with similar meaning, which makes detecting plagiarism a hard task, especially when dealing with semantic meaning, and even harder for cross-language plagiarism detection. Arabic is known for its richness and for the diversity of its word constructions and meanings, hence c...